Data Dependent Distance Metric for Efficient Gaussian Processes Classification
Abstract
The contributions of this work are threefold. First, various metric learning techniques are analyzed and systematically studied under a unified framework to highlight the criticality of a data-dependent distance metric in machine learning. The metric learning algorithms are categorized as naive, semi-naive, complete, and high-level metric learning under a common distance measurement framework. Second, the connection of feature selection, feature weighting, feature partitioning, kernel tuning, etc. with metric learning is discussed, and it is shown that they are all in fact forms of metric learning. Third, it is shown that the realm of metric learning is not limited to k-nearest neighbor (k-NN) classification, and that a metric optimized in the k-nearest neighbor setting is likely to be effective and applicable in other kernel-based frameworks, for example Support Vector Machine (SVM) and Gaussian Processes (GP) classifiers. We support our hypotheses by tuning the length-scale parameters of a GP with a metric learning method proposed in the k-NN framework. Our empirical results on a wide range of machine learning datasets suggest that a metric optimized in the framework of one learning algorithm is likely to be effective in those of others.
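The idea of reusing a metric learned for k-NN as the length-scale parameters of a GP kernel can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the per-feature weights are set here by a simple between-class/within-class variance ratio (a stand-in for the k-NN-driven metric optimization), the toy data are synthetic, and GP regression on the class labels is used as a common approximation to full GP classification.

```python
import numpy as np

# Toy 2-class data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], [0.5, 2.0], size=(20, 2))
X1 = rng.normal([3.0, 0.0], [0.5, 2.0], size=(20, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)

def feature_weights(X, y):
    """One weight per feature: ratio of between-class to within-class
    scatter -- a simple proxy for a diagonal (semi-naive) learned metric."""
    overall_mean = X.mean(axis=0)
    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
    return between / (within + 1e-12)

def ard_rbf(A, B, w):
    """RBF kernel with per-feature weights, i.e. the learned diagonal
    metric reused as inverse squared length-scales (ARD form)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * w).sum(axis=-1)
    return np.exp(-0.5 * d2)

w = feature_weights(X, y)

# GP posterior mean on the (centered) labels, with jitter for stability.
K = ard_rbf(X, X, w) + 1e-6 * np.eye(len(X))
alpha = np.linalg.solve(K, y - y.mean())

x_test = np.array([[3.0, 1.0], [0.0, -1.0]])
pred = ard_rbf(x_test, X, w) @ alpha + y.mean()
print(w)     # weight on the informative feature 0 dominates
print(pred)  # first test point leans toward class 1, second toward class 0
```

The point of the sketch is only the hand-off: whatever weights the metric learning stage produces are plugged directly into the kernel as length-scales, so the GP inherits the data-dependent geometry without retuning.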
Similar resources
An efficient weighted nearest neighbour classifier using vertical data representation
The k-nearest neighbour (KNN) technique is a simple yet effective method for classification. In this paper, we propose an efficient weighted nearest neighbour classification algorithm, called PINE, using vertical data representation. A metric called HOBBit is used as the distance metric. The PINE algorithm applies a Gaussian podium function to assign weights to different neighbours. We compare PIN...
An Information Geometry Approach for Distance Metric Learning
In this paper, we propose a framework for metric learning based on information geometry. The key idea is to construct two kernel matrices for the given training data: one is based on the distance metric and the other is based on the assigned class labels. Inspired by the idea of information geometry, we relate these two kernel matrices to two Gaussian distributions, and the difference between t...
Non-Euclidean metrics for similarity search in noisy datasets
In the context of classification, the dissimilarity between data elements is often measured by a metric defined on the data space. Often, the choice of metric is disregarded and the Euclidean distance is used without further inquiry. This paper illustrates the fact that when noise schemes other than white Gaussian noise are encountered, it can be interesting to use alternative m...
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
Breast cancer diagnosis using nonparametric probability density estimation based on kernel methods
Introduction: Breast cancer is the most common cancer in women. An accurate and reliable system for early diagnosis of benign or malignant tumors seems necessary. Using the results of FNA together with data mining and machine learning techniques, we can design new methods for early diagnosis of breast cancer that are able to detect breast cancer with high accuracy. Materials and Methods: In this study,...
Journal title:
Volume / Issue
Pages -
Publication date: 2014